5 research outputs found

    Heuristic Splitting of Source Code Identifiers

    RÉSUMÉ Maintenance encompasses all the activities performed to modify software after it has been put into operation. It is the most costly phase of software development. Program comprehension is a cognitive activity that relies on building mental representations from software artifacts. Developers spend considerable time reading and understanding their programs before making changes. Clear and concise documentation can help developers inspect and understand their programs, but one of the major problems developers face during maintenance is that documentation is often outdated or simply unavailable. It is therefore important to make source code more readable, for example by urging developers to comment their code and to follow syntactic and semantic rules when writing the identifiers that name the concepts in their programs. However, some identifiers are composed of terms or words that have been abbreviated or transformed. Recognizing the terms composing identifiers is not an easy task, especially when naming conventions are not followed. To our knowledge, two families of identifier-splitting approaches exist: the simplest relies on naming conventions and the presence of explicit separators, while the most complete strategy, implemented by the Samurai tool (Enslen, Hill et al. 2009), is lexicon-based and uses greedy algorithms to identify the words composing identifiers. Samurai assumes that if an identifier is used in one part of the code, it is probably used in the same context as in its code of origin (root). However, the approaches mentioned above have their limits.
First, most of them are unable to associate identifier substrings with words or terms, for example domain-specific terms or English words. Such associations could be useful to assess how well the terms found in the source code express the high-level artifacts associated with them (De Lucia, Di Penta et al. 2006). Second, they cannot handle word transformations, such as the abbreviation of pointer into pntr. Our approach is inspired by speech recognition techniques. The splitting we propose is based on a modified version of the Dynamic Time Warping (DTW) algorithm proposed by Herman Ney for speech recognition (Ney 1984) and on the Levenshtein distance metric (Levenshtein 1966). It was developed to address the limitations of existing approaches, in particular the segmentation of identifiers containing abbreviations and the handling of transformations of dictionary words. The proposed approach was applied to identifiers extracted from two different programs: JHotDraw and Lynx. The results were compared to manually built oracles and to those of a case-based splitting algorithm. They revealed that our approach has a non-deterministic aspect, related to the choice of the transformations applied to dictionary words and of the dictionary words that undergo these transformations. They also show that the proposed approach performs better than the case-based one. In particular, for the Lynx program, Camel Case splitting was able to correctly split only about 18% of the identifiers, whereas our approach split 93% of them correctly.
As for JHotDraw, the Camel Case splitter showed a correctness of 91%, while our approach ensured 96% correct results.----------ABSTRACT Maintenance is the most costly phase of the software life cycle. In industry, the maintenance cost of a program is estimated at over 50% of its total life cycle costs (Sommerville 2000). Practical experience with large projects has shown that developers still face difficulties in maintaining their programs (Pigoski 1996). Studies (Corbi 1989) have shown that over half of this maintenance effort is devoted to understanding the program itself. Program comprehension is therefore essential. Program comprehension is a cognitive activity that relies on the construction of mental representations from software artifacts. Comprehension is more difficult for source code (Takang, Grubb et al. 1996). Several comprehension tools have been developed (Storey 2006); these tools range from simple visual inspection of the text (such as code explorers) to dynamic analysis of program behavior through program execution. While many efforts focus on automating the understanding of programs, a significant part of this work must still be done manually, such as analyzing the source code, technical reports, and documentation. Clear and concise documentation can help developers to inspect and understand their programs. Unfortunately, one of the major problems faced by developers during maintenance is that documentation is often outdated or not available. Indeed, developers are often concerned about time and cost constraints, neglecting to update the documentation of the different versions of their programs. In the source code, identifiers and comments are key means to support developers during their understanding and maintenance activities. Indeed, identifiers are often composed of terms reflecting domain concepts. Usually, identifiers are built by considering a set of rules for choosing the character sequence.
Some identifiers are composed of terms that are abbreviated or otherwise transformed words. Recognizing the terms in identifiers is not an easy task when no naming convention is used. In this thesis we use a technique inspired from speech recognition, Dynamic Time Warping, together with meta-heuristic algorithms, to split identifiers into their component terms. We propose a novel approach to identify the terms composing identifiers, organized in the following steps. A dictionary of English words is built and serves as our source of words. Given an identifier, we look through the dictionary to find terms that are exactly contained in the identifier. For each word of the dictionary, we compute the distance between the word and the input identifier. For terms that exist exactly in both the dictionary and the identifier, the distance is zero; we obtain an exact splitting of the identifier and the process terminates successfully. Other words of the dictionary, with non-zero distance, may indicate that the identifier is built from terms that are not exactly in the dictionary and that some modification should be applied to the words. Some words of the dictionary have more characters than the terms in the identifier, so transformations such as deleting all vowels or deleting some characters are applied to the words of the dictionary. The modification of the words is applied in the context of a Hill Climbing search. For each newly transformed word, we calculate its distance to the input identifier via Dynamic Time Warping (DTW). If the recently created word reduces the global minimum distance, we add that word to the current dictionary; otherwise another transformation is applied. These steps continue until we reach a distance of zero, or the number of characters of the dictionary word becomes less than three, or all possible transformations have been applied.
The identifier is then split with the words whose distances are zero or that have the lowest distance among the words of the dictionary. To analyze the proposed identifier-splitting approach, with the purpose of evaluating its ability to adequately identify the dictionary words composing identifiers, even in the presence of word transformations, we carried out a case study on two software systems, JHotDraw and Lynx. Results based on manually built oracles indicate that the proposed approach outperforms a simple Camel Case splitter. In particular, for Lynx, the Camel Case splitter was able to correctly split only about 18% of identifiers, versus 93% with our approach, while on JHotDraw, the Camel Case splitter exhibited a correctness of 91% and our approach ensured 96% correct results. Our approach was also able to map abbreviations to dictionary words in 44% and 70% of cases for JHotDraw and Lynx, respectively. We conclude that DTW, Hill Climbing and word transformations are useful to split identifiers into words, and we propose future directions for research.
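The matching step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: DTW is approximated here by the closely related Levenshtein edit distance, the dictionary is a tiny hand-picked list, and only a single vowel-dropping transformation is tried (the thesis explores transformations inside a Hill Climbing search). All names are hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def drop_vowels(word):
    """One of the word transformations mentioned in the text:
    keep the first letter, drop later vowels (pointer -> pntr)."""
    return word[0] + "".join(c for c in word[1:] if c not in "aeiou")

def best_match(term, dictionary):
    """Return the dictionary word closest to `term`, trying each word
    untransformed first and its vowel-dropped form next."""
    best, best_dist = None, float("inf")
    for word in dictionary:
        for candidate in (word, drop_vowels(word)):
            dist = edit_distance(term, candidate)
            if dist < best_dist:
                best, best_dist = word, dist
    return best, best_dist

dictionary = ["pointer", "counter", "user", "name"]  # toy dictionary
print(best_match("pntr", dictionary))  # ('pointer', 0): abbreviation recovered
```

A distance of zero after transformation is what lets the approach map an abbreviation like pntr back to pointer, which a convention-based splitter cannot do.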

    Application of Tabu Search to Scheduling Trucks in Multiple Doors Cross-Docking Systems

    RÉSUMÉ : This research focuses on improving cross-docking operations in order to increase service performance levels and reduce costs. The Tabu search algorithm is studied to find the optimal sequence of incoming and outgoing trailers at the cross-dock. The objective of this research is to maximize the total number of direct transfers between a supplier and a common final delivery destination. In current distribution strategies, the goal is to synchronize the manufacturer's and the customer's chains. Cross-docking consists of receiving products from a supplier for several customers and occasionally consolidating them with products from other suppliers bound for common final delivery destinations. In summary, the approach examined in this research offers a significant opportunity to improve cross-dock operations by reducing product storage.----------ABSTRACT : Today's supply chain management performance has been affected by the continuously increasing pressure of market forces. This pressure includes demands for an increased flow of products and throughput with less storage, as well as customer demands for more products with lower operational costs and more value-added services. The supply chain is responsible for reducing costs and increasing service levels by providing transshipments across its members. However, the supply chain has to face fluctuations of demand within short available lead times. The physical problem of warehouse limitations, as well as inventory and shipping costs, affects the performance of the supply chain. In today's distribution strategies, the main goal is to synchronize the customers' and the suppliers' chains. The objective is to reduce the inventory buffering between customers and suppliers.
The idea of cross-docking is to receive different goods from a manufacturer for several end destinations and possibly consolidate the goods with other manufacturers' items for common final customers, then ship them at the earliest possible time. The focus of this research effort is to improve cross-dock operations with the goal of increasing service performance levels and reducing costs. Specifically, the Tabu search metaheuristic is investigated for finding the optimal sequence of incoming and outgoing trailers at cross-docks. This thesis reviews the available research literature on cross-dock operations. Tabu search for the truck scheduling problem is presented along with results. The Tabu search algorithm is investigated for the truck scheduling problem in multiple-door cross-docking with unknown incoming and outgoing sequences. The objective of this research is to maximize the total direct transfer of products from a supplier to common final delivery destinations. The algorithm is implemented in C++ and analyzed using different problem instances. The results obtained with Tabu search are compared with those of an iterative heuristic descent method. They indicate that Tabu search performs significantly better than the descent method for large problem instances. In general, the results show that the Tabu search metaheuristic for multiple- or single-door cross-docking offers the largest potential for improvement. In summary, the approach explored in this research offers a significant opportunity to improve cross-dock operations through reducing the storage of products.
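The Tabu search scheme used above can be sketched generically: explore pairwise swaps of a truck sequence, forbid recently used swaps for a fixed tenure, and keep the best sequence seen. This is a toy sketch in Python (the thesis implementation is in C++), and the objective `score` below is a stand-in for the real direct-transfer count; `target`, the tenure value and the instance size are all illustrative assumptions.

```python
import itertools

def tabu_search(initial, objective, iterations=200, tenure=5):
    """Maximize `objective` over permutations using pairwise swaps;
    a swap that was just applied is tabu for `tenure` iterations."""
    current = list(initial)
    best, best_val = list(current), objective(current)
    tabu = {}  # swap (i, j) -> iteration until which it is forbidden
    for it in range(iterations):
        candidates = []
        for i, j in itertools.combinations(range(len(current)), 2):
            if tabu.get((i, j), -1) >= it:
                continue  # skip tabu moves (no aspiration criterion here)
            neighbor = list(current)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            candidates.append((objective(neighbor), (i, j), neighbor))
        if not candidates:
            break
        val, move, neighbor = max(candidates, key=lambda c: c[0])
        current = neighbor          # accept best non-tabu neighbor,
        tabu[move] = it + tenure    # even if it worsens the objective
        if val > best_val:
            best, best_val = list(current), val
    return best, best_val

# Toy instance: trucks 0..3, and a hypothetical ideal door assignment order.
target = [3, 1, 0, 2]
def score(seq):
    """Stand-in objective: trucks already in their ideal slot."""
    return sum(1 for a, b in zip(seq, target) if a == b)

best, val = tabu_search([0, 1, 2, 3], score)
print(best, val)  # [3, 1, 0, 2] 4
```

Accepting the best non-tabu neighbor even when it worsens the objective is what lets Tabu search escape the local optima where a pure descent method stops, which matches the reported advantage on large instances.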

    Recognizing Words from Source Code Identifiers Using Speech Recognition Techniques

    Abstract—The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Researchers have noticed that identifiers are one of the most important sources of information about program entities and that the semantics of identifiers guide the cognitive process. Recognizing the words forming identifiers is not an easy task when naming conventions (e.g., Camel Case) are not used or strictly followed and/or when these words have been abbreviated or otherwise transformed. This paper proposes a technique inspired from speech recognition, i.e., dynamic time warping, to split identifiers into component words. The proposed technique has been applied to identifiers extracted from two different applications: JHotDraw and Lynx. Results compared to manually built oracles and to the Camel Case algorithm are encouraging. In fact, they show that the technique successfully recognizes the words composing identifiers (even when abbreviated) in about 90% of cases and that it performs better than Camel Case. Furthermore, it was able to spot mistakes in the manually built oracle. Keywords—Source code identifiers; program comprehension.
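The Camel Case baseline that the paper compares against can be sketched as a short regular-expression splitter: it breaks identifiers on underscores, lowercase-to-uppercase transitions and digit runs, but by construction it cannot expand abbreviations such as pntr. A minimal sketch (the exact splitter used in the paper may differ in details such as acronym handling):

```python
import re

def camel_case_split(identifier):
    """Split on underscores, digits and case transitions,
    keeping acronym runs together (parseXMLFile -> parse, XML, File)."""
    words = []
    for part in re.split(r"_+", identifier):
        words.extend(re.findall(
            r"[A-Z]+(?=[A-Z][a-z])"  # acronym followed by a capitalized word
            r"|[A-Z]?[a-z]+"         # ordinary (possibly capitalized) word
            r"|[A-Z]+"               # trailing acronym
            r"|\d+",                 # digit run
            part))
    return words

print(camel_case_split("getUserName"))   # ['get', 'User', 'Name']
print(camel_case_split("parseXMLFile"))  # ['parse', 'XML', 'File']
print(camel_case_split("pntr_cnt"))      # ['pntr', 'cnt'] -- abbreviations stay opaque
```

The last example illustrates the limitation motivating the DTW-based technique: a convention-based splitter separates pntr and cnt correctly but cannot map them to pointer and count.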